Fault Localization Using Textual Similarities

Authors

  • Zachary P. Fry
  • Westley Weimer
Abstract

Maintenance is a dominant component of software cost, and localizing reported defects so that they can be fixed is a significant component of maintenance. The size and complexity of contemporary systems make such fault localization difficult, however. In addition, defect reports often contain incomplete information provided by users who may be unfamiliar with the code base. We propose a lightweight and scalable approach that leverages the natural language present in both defect reports and source code to identify portions of the program that are potentially related to the bug in question. Our technique is language independent and does not require test cases. The approach represents defect reports and source files as separate structured document forms and ranks source files of interest based on a document similarity metric that leverages inter-document relationships. We evaluate the fault-localization accuracy of our method against both lightweight baseline techniques and reported results from state-of-the-art tools. Similar tools have been evaluated using a metric that quantifies the reduction of the overall search space when trying to locate faults. Given information from actual bug reports and their real-world fixes, we utilize a similar metric to gauge the effectiveness of our tool. In an empirical evaluation of 5345 historical defects from three real-world programs totaling 6.5 million lines of code, our approach reduced the number of files inspected per defect by 88%. Additionally, we qualitatively and quantitatively examine the utility of the textual and surface features used by our approach and their implications for conventional defect reporting.
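The general idea of ranking source files by their textual similarity to a defect report, and of scoring the result by how much of the search space it eliminates, can be sketched as follows. This is only an illustrative sketch: the abstract does not specify the similarity metric or the exact evaluation formula, so plain TF-IDF cosine similarity stands in for the paper's metric, and the names `tokenize`, `rank_files`, and `search_space_reduction` are hypothetical helpers, not part of the authors' tool.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split camelCase identifiers, lowercase, and keep alphabetic tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^A-Za-z]+", spaced.lower()) if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per token list in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF; an assumed weighting, not taken from the paper.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(report, files):
    """Rank file paths by similarity to the report, most similar first.

    files maps a path to that file's source text.
    """
    paths = sorted(files)
    vectors = tfidf_vectors([tokenize(report)] + [tokenize(files[p]) for p in paths])
    query, file_vectors = vectors[0], vectors[1:]
    scored = sorted(zip(paths, file_vectors), key=lambda pv: -cosine(query, pv[1]))
    return [path for path, _ in scored]


def search_space_reduction(ranking, fixed_file):
    """Fraction of files left uninspected when reading the ranking top-down
    until the file that was actually fixed is reached (an assumed metric)."""
    inspected = ranking.index(fixed_file) + 1
    return 1.0 - inspected / len(ranking)


# Toy corpus; file names and contents are invented for illustration.
files = {
    "net/HttpClient.java": "class HttpClient { void openConnection(String url) { socket timeout retry } }",
    "ui/Button.java": "class Button { void onClick() { repaint color } }",
    "db/Cache.java": "class Cache { void evict(String key) { expire entry } }",
    "util/Log.java": "class Log { void write(String message) { flush buffer } }",
}
report = "Connection timeout when opening url"
ranking = rank_files(report, files)
reduction = search_space_reduction(ranking, "net/HttpClient.java")
```

In this toy run the report shares the terms "connection", "timeout", and "url" only with `net/HttpClient.java`, so that file is ranked first and three of the four files never need to be inspected. Because the sketch relies only on tokens and not on parsing or test execution, it stays language independent in the same spirit as the approach described above.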

Similar resources

Localizing Latent Software Faults Using Cross-Entropy and n-gram Models

The aim is to automate the process of bug localization in program source code. The cause of program failure could be best determined by comparing and analyzing correct and incorrect execution paths generated by running the instrumented program with different failing and passing test cases. To compare and analyze the execution paths, one approach is clustering the paths according to their simil...

Evaluating & improving fault localization techniques

Most fault localization techniques take as input a faulty program, and produce as output a ranked list of suspicious code locations at which the program may be defective. When researchers propose a new fault localization technique, they typically evaluate it on programs with known faults. The technique is scored based on where in its output list the defective code appears. This enables the comp...

A general noise-reduction framework for fault localization of Java programs

Context: Existing fault-localization techniques combine various program features and similarity coefficients with the aim of precisely assessing the similarities among the dynamic spectra of these program features to predict the locations of faults. Many such techniques estimate the probability of a particular program feature causing the observed failures. They often ignore the noise introduced...

Fault Identification using end-to-end data by imperialist competitive algorithm

Faults in computer networks may result in millions of dollars in cost. Faults in a network need to be localized and repaired to keep the health of the network. Fault management systems are used to keep today’s complex networks running without significant cost, either by using active techniques or passive techniques. In this paper, we propose a novel approach based on imperialist competitive alg...

Textual Similarities Based on a Distributional Approach

The design of efficient textual similarities is an important issue in the domain of textual data exploration. Textual similarities are for example central in document collection structuring (e.g. clustering), or in Information Retrieval (IR) which relies on the computation of textual similarities for measuring the adequacy between a query and documents. The objective of this paper is to present...

Journal:
  • CoRR

Volume abs/1211.2858

Publication date: 2010